# Diffusion Transformer

ICEdit is an instruction-based image editing method built on large-scale diffusion transformers. It reaches state-of-the-art results with only 0.5% of the training data and 1% of the parameters required by prior methods.

Image Generation, Supports Multiple Languages

TextFlux is a high-fidelity multilingual scene text synthesis model based on an OCR-free diffusion transformer. It uses FLUX.1-Fill-dev as the base model and focuses on the scene text synthesis task.

Image Generation

MegaTTS 3 is a zero-shot speech synthesis model based on a sparse-alignment-enhanced latent diffusion transformer. It supports both English and Chinese speech synthesis.

Speech Synthesis, Supports Multiple Languages

Dit Wikiart Large

A diffusion transformer model trained on the Wikiart dataset for generating artwork images.

Image Generation

Dit Wikiart Small

A diffusion transformer model trained on the Wikiart dataset for generating images in artistic styles.

Image Generation

InfiniteYou (InfU) is an identity-preserving image generation framework based on the FLUX Diffusion Transformer (DiT). It can flexibly recraft images while preserving the subject's identity features.

Image Generation, English

An advanced 3D synthesis system developed by Tencent that generates high-resolution textured 3D assets from images or text.

3D Vision, Supports Multiple Languages

TransPixar is a text-to-video generation model that produces RGBA videos, that is, videos with a transparency (alpha) channel.

Video Processing

RDT-170M is a 170-million-parameter imitation learning diffusion Transformer model designed for robot vision-language-action tasks.

Multimodal Fusion

Transformers, English

robotics-diffusion-transformer

OminiControl is a general-purpose control model based on the Diffusion Transformer architecture, focusing on image-to-image tasks.

Image Generation

A 1-billion-parameter imitation learning diffusion Transformer model pretrained on more than 1M multi-robot operation samples, supporting multi-view vision-language-action prediction.

Multimodal Fusion

Transformers, English

robotics-diffusion-transformer

Pixart LCM XL 2 1024 MS

PixArt-LCM is a text-to-image generation model based on the diffusion Transformer. It combines the strengths of PixArt-α and LCM to generate high-quality images from text prompts quickly.

Image Generation

© 2025 AIbase